
5 NVMe Improvements and Why They Matter

The NVMe working group has been prolific, and the recent spec updates are significant.
We picked the top five NVMe spec improvements and offer insights and examples on their business relevance. Taken together, they are nothing short of revolutionary.

Each improvement is covered in two parts:
– what it does and how it works
– its business relevance and why it matters

NVMe Sets

NVMe Sets (called NVM Sets in the specification) isolate noisy neighbors by separating and allocating NAND media, so that workloads (or containers) using one NVM Set do not impact workloads on other sets.

Solving unpredictable latency in multi-tenant uses

Noisy neighbors prevent cloud service providers from offering container services on shared hardware with a service level agreement. NVMe Sets solve this problem.

Picture servers running cloud containers. Now picture one of those containers getting blasted with writes. Other containers on the same SSD stop responding.

The cloud business impact is obvious. Cloud service providers (and private cloud operations) cannot offer containers on shared hardware with a guaranteed quality of service.

This noisy neighbor problem is so pronounced that cloud companies have embarked on projects to entirely rewrite SSD firmware.
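The noisy-neighbor scenario above can be sketched as a toy model. This is not how an SSD is implemented. It assumes, purely for illustration, that a read waits behind every write queued on the same set; the tenant names, the set numbering, and the 1 ms write cost are all hypothetical.

```python
def read_wait_ms(pending_writes, tenant_to_set, reader, write_cost_ms=1.0):
    """Toy model: a read waits behind every write queued on the reader's own set."""
    reader_set = tenant_to_set[reader]
    queued = sum(
        count
        for tenant, count in pending_writes.items()
        if tenant_to_set[tenant] == reader_set
    )
    return queued * write_cost_ms


# Hypothetical workload: one container is blasted with writes, one is idle.
pending_writes = {"noisy": 1000, "quiet": 0}

shared_ssd = {"noisy": 0, "quiet": 0}  # both tenants share the same media
with_sets = {"noisy": 0, "quiet": 1}   # each tenant has its own set

print("shared:", read_wait_ms(pending_writes, shared_ssd, "quiet"))
print("with sets:", read_wait_ms(pending_writes, with_sets, "quiet"))
```

In this model the quiet tenant inherits the whole write backlog on shared media and none of it once it has its own set, which is the isolation property a service level agreement depends on.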


NVMe deterministic IO

Eliminate read latency outliers caused by SSD housekeeping

One chunk of time is allocated to delivering predictable read latency (this is deterministic mode).

Another chunk of time is allocated to housekeeping, during which read latency is unpredictable (this is non-deterministic mode).

NVMe deterministic IO gets interesting when applied across multiple SSDs.

Solving unpredictable latency in cloud database uses

Picture a cloud database that spans a dozen servers, each with a dozen SSDs. Picture a database query that hits these dozen servers and their SSDs. The query completes only as fast as the slowest response.

If SSDs happen to be busy with housekeeping … then ouch. And when hundreds of SSDs are involved, the probability of encountering an SSD in housekeeping mode is magnified.
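To put a rough number on "magnified", here is a minimal sketch. It assumes each SSD is in housekeeping independently of the others, and the 1% chance per SSD is a made-up figure for illustration, not a measurement. Under those assumptions, the chance that a query touching n SSDs hits at least one in housekeeping is 1 - (1 - p)^n.

```python
def prob_any_in_housekeeping(p_busy: float, n_ssds: int) -> float:
    """Chance that at least one of n independent SSDs is in housekeeping."""
    return 1.0 - (1.0 - p_busy) ** n_ssds


# Hypothetical: each SSD spends 1% of its time in housekeeping.
# 144 = a dozen servers with a dozen SSDs each.
for n_ssds in (1, 12, 144):
    print(n_ssds, round(prob_any_in_housekeeping(0.01, n_ssds), 3))
```

With these assumed numbers, a 1% chance on a single SSD grows to roughly three in four for a query that spans all 144 SSDs.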

When IO determinism is coordinated across a group of SSDs (see picture), then SSDs in deterministic mode are employed while SSDs in non-deterministic mode are conveniently omitted from service.

This remedies the “unpredictable” latency problem.
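The coordination idea can be sketched in a few lines. This is only an illustration under stated assumptions: the data is assumed to be replicated on every SSD in the group, the 100 ms cycle and 50 ms deterministic window are hypothetical numbers, and the names `Ssd`, `in_deterministic_window`, and `pick_replica` are invented here. No real NVMe command is issued.

```python
from typing import NamedTuple, Optional, Sequence


class Ssd(NamedTuple):
    name: str
    offset_ms: int  # where in the cycle this SSD's deterministic window starts


CYCLE_MS = 100         # hypothetical length of one full cycle
DETERMINISTIC_MS = 50  # hypothetical length of the deterministic window


def in_deterministic_window(ssd: Ssd, now_ms: int) -> bool:
    """True while the SSD is in its deterministic (predictable-read) window."""
    return (now_ms - ssd.offset_ms) % CYCLE_MS < DETERMINISTIC_MS


def pick_replica(replicas: Sequence[Ssd], now_ms: int) -> Optional[Ssd]:
    """Return the first replica in deterministic mode, or None if all are busy."""
    for ssd in replicas:
        if in_deterministic_window(ssd, now_ms):
            return ssd
    return None


# Staggered offsets: while one SSD does housekeeping, the other serves reads.
replicas = [Ssd("ssd0", 0), Ssd("ssd1", 50)]
for now_ms in (10, 60, 110):
    chosen = pick_replica(replicas, now_ms)
    print(now_ms, chosen.name if chosen else "none available")
```

Because the two windows are offset by half a cycle, some replica is always in deterministic mode, so reads never have to land on an SSD that is busy with housekeeping.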

NVMe over Fabrics

NVMe over Fabrics decouples storage from servers without a high latency penalty. The latest spec adds a TCP transport, which is routable and makes a fabrics target more viable for replication.

Decouple compute from storage without performance penalties, and gain TCP routability for edge storage: replication, media caching, IoT, AI/ML

Datacenters are dynamic places. Demand can swing between:
– lots of storage and very little compute, or
– lots of compute and very little storage.
This led to disaggregated storage. Now, NVMe over Fabrics allows disaggregated storage with the performance of a local NVMe SSD.

TCP routability makes an NVMe over Fabrics storage target acceptable for remote replication.

Why, exactly, should we wrap a server around a group of SSDs for replication?

NVMe Management Interface

The NVMe Management Interface (MI) has optional PCI commands that allow specific control of device reads and writes. This is significant when used with NVMe over Fabrics in points of presence (POPs).

Significantly better storage at the edge for media edge caching, IoT, AI/ML

NVMe over Fabrics, described above, opens even more use cases when used in conjunction with the NVMe Management Interface. The MI spec allows automated information gathering, alarming, discovery, and remote configuration. A closer examination of the NVMe 1.4 spec, Section 7, shows PCI configuration read/write and similar commands.

This makes it possible to use NVMe over Fabrics devices as POP endpoints. That makes them viable for media edge caching (aka the Netflix use case) and for orchestration, where IoT endpoint information is captured, aggregated, and loaded into big data or AI/ML systems.
NVMe MI turns on all of these capabilities.

NVMe Namespace Sharing

NVMe namespace sharing is exactly what it sounds like.

Create a namespace, then allow two or more completely independent PCI Express paths between a single host and that namespace.

Significantly better storage at the edge for media edge caching, IoT, AI/ML

Having one PCI Express path write to a device while another PCI Express path reads from that same device has serious value for endpoint POP media caching, IoT, or replication use cases.

Credit where credit is due …

  1. The NVMe working group has done a stellar job … https://nvmexpress.org/
  2. Facebook published this _critical_ information, leading to the standardization of NVMe sets and NVMe deterministic IO. https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180807_INVT-102A-1_Petersen.pdf
  3. Kazan Networks (now part of WDC) pioneered NVMe over Fabrics with the Onyx chip and the Fuji chip. https://kazan-networks.com/
